1. Field of the Disclosure
The present disclosure relates to a speech enhancement apparatus for emphasizing a consonant portion of an audio signal to improve articulation thereof, and a speech enhancement method therefor.
2. Description of the Related Art
Conventionally, a method for improving articulation by amplifying consonants in an input audio signal has been proposed (see, for example, Patent Document 1). However, the signal level of vowels relative to that of consonants, which determines the amount of masking of the consonants by the vowels, varies greatly depending on the utterer, the language, and the phoneme. Therefore, if consonants are amplified at a constant amplification factor as in this method, it is difficult to improve the articulation of speech when the signal level of the consonants is small. On the other hand, a method has been proposed that secures articulation by changing the amplification factor of consonants according to the time expansion ratio of vowels, so that the energy balance of the audio signal approximates that of natural utterance (see, for example, Patent Document 2).
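The limitation of a constant amplification factor noted above can be illustrated with a minimal sketch. It is not taken from either Patent Document: the sine tones merely stand in for vowel and consonant segments, and the names `FIXED_GAIN`, `tone`, and `vowel_to_consonant_db`, as well as all amplitudes, are hypothetical values chosen for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples):
    """Signal level in dB (relative to an RMS of 1.0)."""
    return 20.0 * math.log10(rms(samples))

def vowel_to_consonant_db(vowel, consonant, consonant_gain=1.0):
    """Vowel level minus consonant level, in dB, after amplifying the consonant."""
    amplified = [consonant_gain * s for s in consonant]
    return level_db(vowel) - level_db(amplified)

def tone(amplitude, n=480, freq=0.05):
    """Hypothetical stand-in for a phoneme segment: a sine tone (24 full cycles)."""
    return [amplitude * math.sin(2.0 * math.pi * freq * i) for i in range(n)]

FIXED_GAIN = 2.0  # assumed constant amplification factor (about +6 dB)

vowel = tone(1.0)
loud_consonant = tone(0.5)    # e.g. one utterer or phoneme
quiet_consonant = tone(0.05)  # e.g. another utterer or phoneme

ratio_loud = vowel_to_consonant_db(vowel, loud_consonant, FIXED_GAIN)
ratio_quiet = vowel_to_consonant_db(vowel, quiet_consonant, FIXED_GAIN)

print(f"loud consonant:  vowel exceeds consonant by {ratio_loud:.1f} dB")
print(f"quiet consonant: vowel exceeds consonant by {ratio_quiet:.1f} dB")
```

With the same fixed gain, the louder consonant reaches the level of the vowel, while the quieter consonant remains about 20 dB below it. Under these assumed values, the quiet consonant would still be exposed to masking by the vowel.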
Documents related to the present disclosure are as follows:
Patent Document 1: Japanese patent laid-open publication No. JP 2006-203683 A; and
Patent Document 2: Japanese patent laid-open publication No. JP H10-145897 A.
However, the method of Patent Document 2 has a problem in that, for consonants whose signal level is small, the masking of the consonants by vowels is not sufficiently compensated for unless the time expansion ratio of the vowels is raised. Consequently, when the durations of the vowels are greatly extended in order to amplify the consonants sufficiently, only unnatural speech is obtained. Further, the methods of Patent Documents 1 and 2 both rely on discriminating consonants from vowels, but it is difficult to do so reliably for speech uttered in a real environment. As a result, the consonants are not correctly amplified, and the articulation of speech cannot be improved.